Application Directed Explicit Management for Advanced Cache Architectures

Authors

  • Xi Wang
  • Viktor K. Prasanna
Abstract

In this paper, we demonstrate the effectiveness of application-directed explicit cache management. We define the generalized split temporal/spatial cache architecture as an abstraction of several advanced cache architectures. We analyze individual problems, identify the inefficiencies in the memory hierarchy, and develop explicit cache management algorithms in which the application software controls the hardware mechanisms directly. To illustrate various optimizations, problems are chosen from regular, sparse, data-structure, and graph applications. Analytical performance estimates are derived for several problems. Simulations show reduced memory traffic and improved average memory access time. For example, in the sparse matrix-vector multiplication problem, the average memory access time can be reduced by 21% to 62% over a broad range of cache configurations.
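As a point of reference for the sparse matrix-vector multiplication example mentioned above, the following is a minimal sketch of the standard compressed sparse row (CSR) kernel. It is not the authors' algorithm; the function name `spmv_csr` and the CSR layout are assumptions chosen for illustration. The comments mark which arrays are streamed once (spatial locality) and which are reused irregularly (temporal locality), the kind of distinction a split temporal/spatial cache is meant to exploit.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # values[k] and col_idx[k] are read exactly once, in order:
        # streaming accesses with spatial but no temporal locality.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[k]] is an indirect access; entries of x may be
            # reused across rows, so x is the array with temporal locality.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

In a conventional cache, the streamed `values` and `col_idx` arrays can evict the reusable entries of `x`; how the paper's explicit management handles this is not described in the abstract.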


Similar resources

"threads: a System for the Support of Concurrent Programming". Technical Report

Many parallel applications are implemented using lightweight thread packages. The low overhead associated with user-level thread management encourages programmers to use threads to exploit fine-grain parallelism in an application. Although the overhead of explicit thread management can be very small, there is other overhead associated with lightweight threads: the time required to load data into ...


Reducing Code Size with Run-Time Decompression

Compressed representations of programs can be used to improve the code density in embedded systems. Several hardware decompression architectures have been proposed recently. In this paper, we present a method of decompressing programs using software. It relies on a software-managed instruction cache under control of the decompressor. This is achieved by employing a simple cache management ...


Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture

We are now entering the multi-core era: many multi-core chips are designed and manufactured by various vendors, such as Intel, AMD and Sun. IBM Cyclops-64 (C64) is a multi-core architecture that provides massive on-chip parallelism, massive on-chip bandwidth, and a multi-level memory hierarchy. This type of multi-core architecture presents big challenges to application developers and system...


Structured Parallel Programming and Cache Coherence in Multicore Architectures

It is clear that multicore processors have become the building blocks of today’s high-performance computing platforms. The advent of massively parallel single-chip microprocessors further emphasizes the gap that exists between parallel architectures and parallel programming maturity. Our research group, starting from its experience with distributed and shared memory multiprocessors, was one of the...


MMU-based software cache and swap mechanisms for smart card operating systems

Modern processor architectures have proven their benefits by providing mechanisms that significantly improve the performance of classical operating systems. Yet it has not been proven that such processors are equally relevant in the Smart Card context. To start answering this question, we analyze how the Memory Management Unit of the MIPS 4KSc architecture can be used to build efficient mem...




Publication date: 2002